Distributed Sequential Pattern Mining in Large Scale Uncertain Databases

نویسندگان

  • Jiaqi Ge
  • Yuni Xia
چکیده

While sequential pattern mining (SPM) is an import application in uncertain databases, it is challenging in efficiency and scalability. In this paper, we develop a dynamic programming (DP) approach to mine probabilistic frequent sequential patterns in distributed computing platform Spark. Directly applying the DP method to Spark is impractical because its memory-consuming characteristic may cause heavy JVM garbage collection overhead in Spark. Therefore, we design a memoryefficient distributed DP approach and use an extended prefix-tree to save intermediate results efficiently. The extensive experimental results in various scales prove that our method is orders of magnitude faster than straight-forward approaches.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Mining Uncertain Sequential Patterns in Iterative MapReduce

This paper proposes a sequential pattern mining (SPM) algorithm in large scale uncertain databases. Uncertain sequence databases are widely used to model inaccurate or imprecise timestamped data in many real applications, where traditional SPM algorithms are inapplicable because of data uncertainty and scalability. In this paper, we develop an efficient approach to manage data uncertainty in SP...

متن کامل

Distributed Sequential Pattern Mining: A Survey and Future Scope

Distributed sequential pattern mining is the data mining method to discover sequential patterns from large sequential database on distributed environment. It is used in many wide applications including web mining, customer shopping record, biomedical analysis, scientific research, etc. A large research has been done on sequential pattern mining on various distributed environments like Grid, Had...

متن کامل

Progressive CFM-Miner: An Algorithm to Mine CFM - Sequential Patterns from a Progressive Database

Sequential pattern mining is a vital data mining task to discover the frequently occurring patterns in sequence databases. As databases develop, the problem of maintaining sequential patterns over an extensively long period of time turn into essential, since a large number of new records may be added to a database. To reflect the current state of the database where previous sequential patterns ...

متن کامل

Abstract—Mining Sequential Patterns in large databases has become

Mining Sequential Patterns in large databases has become an important data mining task with broad applications. It is an important task in data mining field, which describes potential sequenced relationships among items in a database. There are many different algorithms introduced for this task. Conventional algorithms can find the exact optimal Sequential Pattern rule but it takes a long time,...

متن کامل

Hybrid Technique for Frequent Pattern Extraction from Sequential Database

Data mining has became a familiar tool for mining stored value from the large scale databases that are known as Sequential Database. These databases has large number of itemsets that can arrive frequently and sequentially, it can also predict the users behaviors. The evaluation of user behavior is done by using Sequential pattern mining where the frequent patterns extracted with several limitat...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016